Hercules: A Power Analyzer for MOS VLSI Circuits


Akhilesh Tyagi
University of Washington, Seattle


Abstract 

We present new accurate slope based models for efficient
computation of both the switching and direct currents in nMOS
and CMOS. These models, along with the depth-first search of
stages, are incorporated into a power analysis CAD tool, Hercules.
In addition to the current levels in a circuit, Hercules reports
the worst case voltage drops from the power pin to the device
drain/source, validating the power bus design. The metal segments
in the power/ground bus having a potential electromigration
problem are flagged. All the cascaded drivers whose size ratio
deviates significantly from the ideal ratio based on the power
consumption are also detected.


1  Introduction 

Within the past decade, the CAD community has developed an
impressive array of graphics layout systems, geometric design
rule checkers, circuit extractors, simulation tools and timing
analyzers to manage the massive complexity [HJK*85] of current
day VLSI digital designs. Unfortunately we do not yet have a
CAD tool to help us detect the problems associated with
excessive power consumption. We have reached a point where
we have a technology, gallium arsenide, that is primarily limited
by its power requirements. A gallium arsenide gate at peak
performance level can dissipate between 1 mW and 5 mW of power.
Even the domino GaAs circuits reported in [YCL87] have power
requirements upwards of 0.55 mW/gate. Examples of nMOS
systems whose cost is determined primarily by the power supplies
and the cooling peripherals abound in the literature. There are
cases when a CMOS circuit will not function correctly because
the effect of excessive voltage drops in the power bus was not
taken into account. These are precisely the reasons why we felt
a definite need for a CAD tool that can validate a design with
respect to power consumption.

We know of two programs that estimate power consumption,
both for nMOS circuits only. The Berkeley tools included a power
estimator called powest for nMOS [Cme82]. It simply counts the
number of pull-up load devices (e.g. depletion load). For each
type of pull-up device, e.g. depletion mode n-channel, a fixed
number (user modifiable) is used to represent the fraction of
time it would usually be on. The product of the number of devices
and this fraction is reported as the power consumption. This
information has a very limited use. More recently, Jeff Wilson
[Wil85] developed pwranal, which is a more sophisticated tool.
It retains the topology of the power bus as a metal tree. pwranal
uses the metal tree representation to report the pin to device
terminal voltage drops. However, pwranal can deal only with a
single power pin and one layer of metal. Moreover it breaks the
loops in the metal bus structure arbitrarily, which can give rise
to very pessimistic numbers when dealing with a commonly
encountered form of loop in an array, a comb. Unfortunately, its
load current estimation algorithms are only a small improvement
over powest.

The two main problems a power analysis tool needs to address
are the current flow estimation and the power bus evaluation.
We propose slope based models for calculating the switching
current levels. Ousterhout [Ous84] has used slope based models
for computing the effective resistances of MOS devices in the
delay estimator Crystal. However, in the CMOS case, estimating
the load switching current is not the end of the story. Due to
slow rising input transitions, there could be a low impedance
direct path between Vdd and Gnd for a considerable duration.
Veendrick [Vee84] observes that this direct current component
in CMOS inverters can be as high as the load current. Thus any
system estimating the current flow in a CMOS circuit ought to
account for both the load current and the direct current. We
model the total charge flow due to the short circuit direct
current component in the slope framework. This gives us a very
accurate model of the total switching current. We observe at the
outset that our techniques and models are of a very general
nature. They can be easily specialized to any enhancement or
depletion based technology like nMOS, CMOS or GaAs. We have
used these models in Hercules, a prototype program written in C
running on a VAX-11/780.

As we mentioned earlier, the other problem is that of estimating
the voltage drops in the power bus. In that direction, we extend
the metal tree idea of Wilson [Wil85] to accommodate circuits
with multiple power pins, multilayer metal bus and loops. It is
not very often that a circuit designer routes power with loops
in the power distribution metal bus structure. Nonetheless, we
observed that in most of the cases when a loop is introduced, it
is a comb loop (described later). Most array constructs tend to
use it. Our aim was to analyze these loops efficiently. We do not
yet have an efficient method of dealing with a general kind of
loop.

The innards of Hercules consist of current estimation and
metal tree evaluation algorithms, which are conceptually
independent problems. For this reason, we describe them in
different sections. We explain the current estimation algorithms
in Section 2, followed by the power tree evaluation algorithms in
Section 3.

2 Current Estimation Algorithm

We wanted to be able to model the current flow as an instance
of the max flow problem, with charge being the commodity
pushed through the network. The switch model provides the
ideal starting point for this idea. Each switch can be thought
of as a node in the commodity network, with the switch capacity
(transistor width) and switch status (on or off) determining the
network node capacity. Applying this idea to the whole circuit
can still be expensive. For an exact analysis, we need to
determine the state of all the switches in the network to work
out the node capacities. This involves a complete simulation of
the circuit for all 2^n input vectors (for an n input circuit)
to determine the worst case. Thus, we need to take a
value-independent approach to control the problem complexity,
like the timing verifiers TV [Jou83] and Crystal [Ous85] do. The
stage decomposition provides just the right abstraction for an
approximate analysis. A stage is a chain of transistors leading
from a strong voltage source (like Vdd, Ground, an input or a
highly capacitive bus) to an output or a gate. Typically a stage
consists of a logic gate along with all the pass transistors
following it. Refer to Figure 1 for an example of a stage. Note
that a chain of transistor channels forms an electrical path from
a voltage source to the output of a stage. Thus, the flow of
charge through a stage is conserved. One transistor in a stage is
distinguished as the trigger. It is the transistor directly
driven by the signal, e.g., for the stage depicted in Figure 1
the n-type transistor driven by in1 is the trigger. All the other
transistors in a stage are assumed to be fully on, unless
otherwise specified by the user. This gives rise to a worst case
estimate.

The stage activated by signal in1 going from 0 → 1 is shown
by the thick line. It starts with the n-channel transistor
connected to in1, followed by two pass transistors A and B,
terminating in the gate C.

Figure 1: An Example of a Stage

As we mentioned earlier, we need to be able to estimate the
short-circuit direct current to analyze a CMOS circuit
accurately. Thus the total current flow has two components.
Both these components, load and direct, need to be calculated.
We compute both the load current and the direct current at the
stage level. We also distinguish between the average and peak
current levels. The average current is the total current
averaged over a clock period. The average figure should be used
to determine if a circuit is prone to such statistical phenomena
as metal migration. The peak current is the maximum level of
current during a clock period. The peak current figure is used
in estimating the worst case voltage drops in the power bus.

Before Hercules can be instructed to do power analysis, it
has to read the circuit to be analyzed in CIF form. It builds up
a metal tree data structure (described in Section 3) out of the
CIF description. A user can pose a question such as, "What are
the repercussions of clock φ1 going from 0 → 1? Which metal
wire segments are susceptible to electromigration?". The stages
are traced out in a depth first manner. Hercules finds all the
nodes that can be affected by this clock transition. These nodes
are the basis of the initial set of stages. Each of these stages
is handled in turn. For each stage, the average and peak load and
direct current calculations are performed, which associates a
current flow value with each node in the stage. The output node
of each of these stages can activate another set of stages. This
set of children stages is visited next.
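
The depth first tracing of stages described above can be sketched
as follows. This is only an illustration, not the Hercules source
(which is written in C): the Stage record, the trace_stages
function and the analyze callback are hypothetical names, and
skipping a stage that has already been visited is an added
assumption that the text does not state.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Stage:
    """Hypothetical stage record: a name plus the stages its output can activate."""
    name: str
    children: List["Stage"] = field(default_factory=list)


def trace_stages(initial_stages: List[Stage],
                 analyze: Callable[[Stage], None]) -> List[str]:
    """Visit stages depth first, calling analyze(stage) once per stage.

    Returns the stage names in the order they were analyzed.
    """
    visited = set()
    order = []
    # Reverse so that the first initial stage is popped first.
    stack = list(reversed(initial_stages))
    while stack:
        stage = stack.pop()
        if stage.name in visited:
            continue  # assumption: each stage is analyzed only once
        visited.add(stage.name)
        analyze(stage)  # e.g. the load and direct current calculations
        order.append(stage.name)
        # Children are explored before the remaining sibling stages.
        stack.extend(reversed(stage.children))
    return order
```

Here analyze stands in for the per-stage current calculations; a
stage's children are explored before its siblings, which is what
makes the traversal depth first.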


2.1 The Load Current Estimation

In a stage, every transistor other than the trigger is assumed
to be fully on. Thus their current flow could be as high as
their saturation current. The trigger transistor, on the other
hand, is going through an input transition. The rise time of the
input signal, the gain β of the device, and the load determine
the total amount of current flow. The key to an efficient
implementation of this model is that all these factors can be
combined into a single number called the rise time ratio, which
is the input signal rise time divided by the native rise time.
The native rise time is the rise time of the output node when a
step input is applied. The information about the load current
can be extracted from SPICE output on a device with different
rise time ratio values. This can be stored in a table indexed by
the rise time ratio. Hercules uses this table to estimate the
current flow through a stage. Circuit designers do not use the
whole continuum of the device sizes available to them. In
practice, people tend to use only a fixed set of sizes in a given
circuit. Thus it is feasible to build such a table, and
fortunately it is also compact. For rise time ratios exceeding
the table bounds, extrapolation is used. Thus the current
modeling for the trigger device costs one table lookup and some
arithmetic. This approach was first used by Ousterhout [Ous84]
to model device resistances.
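
A minimal sketch of such a table lookup is given below. It assumes
linear interpolation between table entries and linear
extrapolation from the end segments; the text only says that
extrapolation is used, so the exact scheme is our assumption. The
function names and the table values are illustrative and are not
SPICE data.

```python
from bisect import bisect_left


def rise_time_ratio(input_rise_time, native_rise_time):
    """Input signal rise time divided by the native (step input) rise time."""
    return input_rise_time / native_rise_time


def lookup_current(table, ratio):
    """Estimate the trigger current for a given rise time ratio.

    table is a list of (rise_time_ratio, current) pairs sorted by ratio,
    with at least two entries. Ratios inside the table are linearly
    interpolated; ratios outside it are extrapolated from the nearest
    end segment (an assumed scheme).
    """
    ratios = [r for r, _ in table]
    i = bisect_left(ratios, ratio)
    # Clamp to the first or last segment when ratio is out of bounds.
    i = min(max(i, 1), len(table) - 1)
    (r0, c0), (r1, c1) = table[i - 1], table[i]
    return c0 + (c1 - c0) * (ratio - r0) / (r1 - r0)


# Illustrative table for one device size: (ratio, current in mA).
EXAMPLE_TABLE = [(0.5, 4.0), (1.0, 3.0), (2.0, 2.0)]
```

One such table would be kept per device size in use, which is why
a small fixed set of sizes keeps the storage compact.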

At this point, we know the current flow capacities of every
transistor in the stage. The peak load current through a stage
is the minimum of the current flow capacities of all the
transistors in the stage. The average charge flow through a
stage is the product of total load capacitance and voltage
swing. The average load current through a stage is estimated by
dividing the average charge by the delay through that stage. The
side paths from a stage are assumed to be off. In practice, in a
structure like a pass transistor array (barrel shifter), not
many such paths originating at the same node seem to be active
simultaneously.
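
The two load current figures above reduce to a few lines of
arithmetic. The sketch below uses hypothetical function and
parameter names, and the example values in the usage note are made
up for illustration.

```python
def peak_load_current(transistor_capacities):
    """Peak load current of a stage: the smallest current flow capacity
    among the transistors in the chain (the bottleneck)."""
    return min(transistor_capacities)


def average_load_current(load_capacitance, voltage_swing, stage_delay):
    """Average load current of a stage: charge (C * V) divided by the
    delay through the stage."""
    charge = load_capacitance * voltage_swing
    return charge / stage_delay
```

For instance, a stage whose transistors can carry 2.0 mA, 0.8 mA
and 1.5 mA has a peak load current of 0.8 mA, and a 100 fF load
swinging 5 V through a stage with a 2 ns delay draws an average
of 0.25 mA.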

2.2 Direct Current Models

In CMOS static logic gates, there is a direct (short-circuit)
current component when the input signal switches. Clearly the RC
model would not be sufficient to model this component, because
it assumes step function input signals. We noticed that the
total charge that flows from Vdd to Gnd due to direct current is
a function only of the rise time ratio. Contrast this with the
load current model, where it is the load current, rather than
the charge, that is a function of the rise time ratio. Once
again, the information about direct charge flow can be extracted
from many output decks of SPICE runs on a device with a range of
rise time ratio values. Two tables indexed by the rise time
ratio can be built, one each for the average and peak direct
charge levels. Hercules can then use these tables to model the
average and peak direct current flow through a stage with an
inverting structure. The stage building algorithm tags each
stage that contains a chain of p-type and n-type devices between
Vdd and Gnd with a common input. To recapitulate, this technique
works for the following reason. Most circuits are designed with
few combinations of transistor and load sizes. SPICE runs on
small pieces of circuits give almost all the information about
the operating environment of these small circuits within a very
large circuit.

Figure 2: A Plot of Total Direct Charge Flow Vs. Rise Time Ratio

Figure 3: An Example of a Comb Structure

Figure 2 shows the dependence of total charge flow due to
direct current on the rise time ratio for n-channel and
p-channel devices. Note that the charge flow is directly
proportional to the power consumption due to direct current. For
a rise time ratio of 1, the total charge flow is minimized.
Veendrick makes the same observation in [Vee84] and uses this
fact to design buffers that are optimal with respect to power
consumption. The peak and average direct currents are maximum
when there is no load. Both of them tend to decrease with
increasing load. The reason is that a part of the direct current
goes into driving the load. With a higher load, the amount of
excess charge available to flow from Vdd to ground is less. With
a rise time ratio equal to one, the input and output signal
speeds are balanced. If the input is driven at a slower rate
than this, then the time during which the inverter is in
short-circuit mode ($V_{inv} - V_{th} < V_{in} < V_{inv} + V_{th}$)
increases. This gives rise to higher direct charge flow, as in
Figure 2. On the other hand, if the input signal is faster than
this, then the load cannot absorb all of the direct current.
Thus, there is more of the excess charge available to flow
between Vdd and ground. In any case, Veendrick remarks that for
a rise time ratio of one, the short-circuit power dissipation
would be less than 20% of the total power dissipation. Now we
have a quantitative measure of what it means to have a good
driving ratio between the driver size and the load. We can
easily identify all the inverters (stages) with a rise time
ratio within δ of 1, for a user specified δ. The user can choose
a δ based on how finely tuned a system (s)he wants, and Hercules
will flag all the stages outside this bound. Note that Hercules
computes the rise time ratio for each stage anyway, to model the
current flow.
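
The check for well balanced drivers, where a stage is flagged
when its rise time ratio is further from 1 than a user chosen
tolerance, can be sketched as below. The function name, the
dictionary layout and the stage names are hypothetical; only the
rule itself comes from the text.

```python
def flag_unbalanced_stages(stage_ratios, delta):
    """Return the names of stages whose rise time ratio is not within
    delta of 1.

    stage_ratios maps a stage name to its rise time ratio; delta is the
    user specified tolerance.
    """
    return [name for name, ratio in stage_ratios.items()
            if abs(ratio - 1.0) > delta]
```

With a tolerance of 0.25, ratios of 1.05, 1.6 and 0.4 would leave
the first stage unflagged and report the other two. A smaller
tolerance corresponds to a more finely tuned design.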


3 Metal Bus Data Structure


Distributing power and ground with the least resistance is
one of the major concerns in a design. Many a time, a circuit
fails to perform because the drain terminals receive a voltage
level significantly lower than Vdd and the source terminals are
well above ground, although the external power supply is
functioning properly. What can be worse is that different
subsystems might be receiving different operating voltages.


How should the loops in the power/gnd structure be handled?
If the loops are broken at an arbitrary edge, there could be a
large error in the power bus resistance calculation. We observed
that it is very seldom that a designer routes power with looping
structures. However the pattern shown in Figure 3 is used quite
often to feed power to an array of cells. The bottom line
introducing the loops reinforces the distribution. Hercules
attempts to do a good job with comb loop structures. We have an
exact and efficient (linear time) algorithm for analyzing a
power bus with comb loops. For the details of the Metal Tree
Data Structure (MTDS), the reader is referred to the author's
thesis [Tya87].
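
To make the pin to device voltage drop calculation concrete, here
is a sketch for the simplest case: a loop-free metal tree fed from
a single power pin. It is not the MTDS algorithm, whose details
are in the thesis, and it does not handle comb loops or multiple
pins. The array based tree encoding and all names are assumptions
made for this illustration.

```python
def voltage_drops(parent, resistance, load_current):
    """Voltage drop from the power pin (node 0) to every node of a tree.

    parent[i]       index of the parent of node i (parent[0] is -1);
                    nodes are numbered so that parent[i] < i.
    resistance[i]   resistance of the metal segment from parent[i] to i.
    load_current[i] current drawn by devices attached at node i.

    Returns (drop, segment_current), where segment_current[i] is the
    current through the segment feeding node i.
    """
    n = len(parent)
    # A segment carries the loads of the whole subtree below it.
    segment_current = list(load_current)
    for node in range(n - 1, 0, -1):
        segment_current[parent[node]] += segment_current[node]
    # Drops accumulate along the path from the pin to each node.
    drop = [0.0] * n
    for node in range(1, n):
        drop[node] = drop[parent[node]] + resistance[node] * segment_current[node]
    return drop, segment_current
```

Both passes touch each node once, so the cost is linear in the
number of segments. The per segment currents computed on the way
are also the quantities one would compare against a current
density limit when looking for electromigration risks.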

4 Performance

Both the short-circuit current and load current models were
validated with the presently operational version of Hercules.
Hercules was run on a control PLA and a register file for a
32-bit microprocessor, QuarterHorse [HJK*85], designed at the
University of Washington. The control PLA has 19 inputs, 68
outputs and 93 implicants. The register file has 32 32-bit
registers designed in the cross-coupled static style. Hercules
was also tried on some 3 stage NAND and NOR networks. The
results are encouraging. For all the cases, the average load
current and the average direct current reported are within 25%
of the SPICE reported numbers. However the peak load current was
overestimated by as much as 100%. Recall that the peak load
current is the minimum of the peak currents of all the
transistors in a stage. The peak direct current is calculated by
dividing the total charge flow by the time the input signal
takes to go from $V_{inv} - V_{th}$ to $V_{inv} + V_{th}$. This
number was also found to be off by as much as 80% on the higher
side. We believe that we can do better than this with the peak
load current calculation. We still do not know how to better
predict the peak direct current.